
Access to the data used in this project can be requested online at https://www.stats.govt.nz/integrated-data/apply-to-use-microdata-for-research/. 
Statistics NZ can also be contacted for more information at +6449314253 or access2microdata@stats.govt.nz. 
At the moment, access is only possible at an approved datalab facility in New Zealand, but this could change in the future. 
Access to LBD data can currently only be granted to NZ government researchers. 

All analysis was run using Stata MP Version 14.2 in a Windows 10 environment. 
The ado-file GRR.ado is provided by Statistics NZ to aid in confidentializing output data.

The following raw database files should be requested from Statistics NZ:
a) Complete Linked Employer-Employee Data from the IDI database 
b) Longitudinal Business Database
c) Annual Business Operations Survey 
d) Quarterly Household Labour Force Survey linked to the IDI 

The following auxiliary datasets are provided and must be placed in the working folder:
a) anz_map.dta
b) CPI.dta
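
As a minimal sketch (not part of the original package), the following Stata lines check that both auxiliary files are present before any do-file is run. It assumes the Stata working directory has already been set to the working folder.

```stata
* Sketch only: confirm the auxiliary files are in the current working directory
foreach f in anz_map.dta CPI.dta {
    capture confirm file "`f'"
    if _rc {
        display as error "Missing auxiliary dataset: `f'"
        exit 601
    }
}
display as text "Both auxiliary datasets found"
```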

*******************************************************************************;
Analysis Datasets are created using the following code
*******************************************************************************;

Step 1: Data extraction using SQL from Statistics NZ databases. 

Statistics NZ archives old static copies of the IDI database files, so those copies are no longer available to researchers. 
Some derived data tables provided by Statistics NZ also change name and location when updates occur, and old versions become unavailable.  
The following SQL code will therefore likely need to be modified to run on the data provided to any given researcher. 
We will help with this process to whatever extent is possible.    

(1) Wgap_extr_v3b_ed.sql extracts the following from the IDI database:

a) firm productivity data 
b) firm permanent industry and business type and institutional sector
c) main employee IDI data set
d) firm level IDI/LBD data sets
e) firm multiplant indicator
f) information needed for two-way fixed effects (FE) estimation

(2) Wgap_BOSextr_v3_edclean.sql extracts data from the BOS database

(3) Wgap_HLFS_IDIextr_v3_ed.sql extracts data from the HLFS database

Step 2: Setting up the data and running two-way fixed effects (FE) models 

(1) Run GWgap_2wFE_extract_ed.do - creates datasets used for the estimation of two-way FE models
(2) Run GWgap_2wFE_extract_v4_ed.do - estimates two-way FE models
(3) Run GWgap_2wFE_fagg_v4_ed.do - processes results from two-way FE models

Step 3: Creation of analysis datasets
(1) Run GWgap_readin_v3b_ed.do 
(2) Run GWgap_clean_ind_v3_ed.do 
(3) Run GWgap_compet_v3_ed.do
(4) Run GWgap_postreadin_v4_ed.do
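
The run order for Steps 2 and 3 can be sketched as a single master do-file. This wrapper is hypothetical and is not supplied with the package. It assumes that all do-files sit in the current working directory, that the Step 1 SQL extracts have already been run, and that the file names are exactly as listed above.

```stata
* Hypothetical master do-file (sketch only, not part of the package)
version 14.2

* Step 2: two-way fixed effects set-up, estimation, and post-processing
local step2 GWgap_2wFE_extract_ed GWgap_2wFE_extract_v4_ed GWgap_2wFE_fagg_v4_ed

* Step 3: creation of the analysis datasets
local step3 GWgap_readin_v3b_ed GWgap_clean_ind_v3_ed GWgap_compet_v3_ed GWgap_postreadin_v4_ed

* Run each do-file in the listed order; Stata appends .do when no extension is given
foreach f in `step2' `step3' {
    display as text "Running `f'.do"
    do `f'
}
```

Because do stops at the first error, a failed step prevents later steps from running on incomplete inputs.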

*******************************************************************************;
Analysis Datasets used to Produce Paper Tables and Figures
*******************************************************************************;

(1) GWgap_pr_IDI_v4: Individual level dataset, Full Sample, Only IDI Information

snz_uid			Individual ID Code                	
female			Female	
year			March year	
hp_pent			ID Code for Enterprise (pent) of Highest Paying Job in year	
max_gross_earn_yr	Individual's wage/salary earnings from highest-paying pent in year	
max_fte_employee_av	Average monthly non-WP employee FTEs of employee at hp pent	
max_mon_wkd		Number of calendar months worked at highest paying pent	
num_pents		Number of pents worked at during year	
age			Age	
wkd_hpp_1ya		Worked for highest paying pent in previous year	
wkd_hpp_2ya		Worked for highest paying pent for last 2 yrs	
anz06_4d		Industry	
multiplant		Firm had multiple plants in 2+ months	
L_hc			Average non-WP employee head count at pent	
CPI			(mean) CPI	
lnWpm_hp		Average wage/salary earnings per month worked at hp pent (ln real)	
indiv_avfte_hp		Average FTEs at hp pent in each month worked at hp pent	
num_pents_cat		Number of pents worked at during year (categories)	
full_yr			Worked 12 months of year	
full_yr_hp		Worked 12 months of year at hp pent	
ind4			4-digit ANZSIC industry	
age_cat			Age category	
ffe_m			Firm fixed effect calculated from males only	
ffe_f			Firm fixed effect calculated from females only	
ffe_t			Firm fixed effect calculated from both genders	
wfe_m			Worker fixed effect calculated from males only	
wfe_f			Worker fixed effect calculated from females only	
wfe_t			Worker fixed effect calculated from both genders	
Q_ffe			Sample with non-missing male and female firm FE	
Q_inprod		Sample in the productivity data	
Q_gprod			Sample with good productivity data	
Q_hc5			Indicator for firm has a head count of at least 5	
Q_rest			Indicator for restricted sample	
lnIDI_earnhp		Log real annual earnings from highest-paying pent (IDI)	

(2) GWgap_IDI_indivregdata: Individual level dataset, Main Analysis Sample, Only IDI Information

snz_uid			Individual ID Code
female			Female
year			March year
hp_pent			ID Code for Enterprise (pent) of Highest Paying Job in year
tot_gross_earn_yr	Individual's total wage/salary earnings for year
max_gross_earn_yr	Individual's wage/salary earnings from highest-paying pent in year
tot_fte_employee_av	Average monthly non-WP employee FTEs of employee
max_fte_employee_av	Average monthly non-WP employee FTEs of employee at hp pent
tot_mon_wkd		Total number of calendar months worked
max_mon_wkd		Number of calendar months worked at highest paying pent
num_pents		Number of pents worked at during year
age			Age
wkd_hpp_1ya		Worked for highest paying pent in previous year
wkd_hpp_2ya		Worked for highest paying pent for last 2 yrs
anz06_4d		Industry
multiplant		Firm had multiple plants in 2+ months
L_hc			Average non-WP employee head count at pent
s_f			Female share of pent head count
in_prody_dat		pent-year is in productivity data
CPI			(mean) CPI
lnWpm			Average wage/salary earnings per month worked (ln real)
lnWpm_hp		Average wage/salary earnings per month worked at hp pent (ln real)
indiv_avfte		Average FTEs in each month worked
indiv_avfte_hp		Average FTEs at hp pent in each month worked at hp pent
num_pents_cat		Number of pents worked at during year (categories)
full_yr			Worked 12 months of year
full_yr_hp		Worked 12 months of year at hp pent
lnL_hc			Pent average head count (ln)
lnL_fte			Pent average FTE (ln)
ind4			4-digit ANZSIC industry
age_cat			Age category

(3) GWgap_pr_HLFS_v4: Individual level dataset, Main Analysis Sample Matched to HLFS

snz_uid			Individual ID Code
female			Female
year			March year
hp_pent			ID Code for Enterprise (pent) of Highest Paying Job in year
snz_ird_uid		Individual ID Code (IRD)
max_gross_earn_yr	Individual's wage/salary earnings from highest-paying pent in year
WP			Ever a working proprietor at any pent
max_fte_employee_av	Average monthly non-WP employee FTEs of employee at hp pent
max_mon_wkd		Number of calendar months worked at highest paying pent
max_mon_wkd_ft		Number of calendar months worked full time at highest paying pent
max_mon_wkd_pt		Number of calendar months worked part time at highest paying pent
num_pents		Number of pents worked at during year
age			Age
wkd_hpp_1ya		Worked for highest paying pent in previous year
wkd_hpp_2ya		Worked for highest paying pent for last 2 yrs
anz06_4d		Industry
multiplant		Firm had multiple plants in 2+ months
L_fte			Non-WP employee FTEs
L_hc			Average non-WP employee head count at pent
L_hc_ft			Average ft non-WP employee head count at pent
L_hc_pt			Average pt non-WP employee head count at pent
av_fte_m		Average monthly FTEs of m in the pent
av_fte_f		Average monthly FTEs of f in the pent
s_f			Female share of pent head count
s_f_fte			Female share of pent FTEs
L_WP			Working proprietor head count
rand			stable random number for observation
CPI			(mean) CPI
lnWpm_hp		Average wage/salary earnings per month worked at hp pent (ln real)
indiv_avfte_hp		Average FTEs at hp pent in each month worked at hp pent
num_pents_cat		Number of pents worked at during year (categories)
full_yr			Worked 12 months of year
full_yr_hp		Worked 12 months of year at hp pent
ft_hp			Fraction of employee's months at hp pent worked full time
lnL_hc			Pent average head count (ln)
WP_cat			Working proprietor headcount (categories)
ind4			4-digit ANZSIC industry
ind1			1-digit ANZSIC industry
ind2			2-digit ANZSIC industry
ind3			3-digit ANZSIC industry
female_HLFS		Female
par_u18_ct_HLFS		Parent of number of dependent children (max in the year)
multi_job_HLFS		Ever reported multiple jobs in HLFS in the year
ft_HLFS			Ever worked full time in the year
nz_born_HLFS		New Zealand born
hhcomp_HLFS		Household composition
age_HLFS		Average age in the year
hrs_main_HLFS		Average hours in main job
hrs_oth_HLFS		Average hours in non-main jobs
hrs_tot_HLFS		Average total hours in all jobs
occ2d_HLFS		Occupation (2-digit ANZSCO 2006, first in year)
occ3d_HLFS		Occupation (3-digit ANZSCO 2006, first in year)
ind2_HLFS		Industry (ANZSIC 2006 level 2, first in year)
rc_HLFS			Regional Council (first in year)
HLFS_wgt_HLFS		Sum of HLFS weights in included quarters
num_quart_HLFS		Number of quarters included from HLFS data
age_cat			Age category
pf_ind			Productivity industry of highest paying pent from IDI, firms in prody data only
M_nom			Materials
K_nom			Capital
go_nom			Gross Output
K_real			Real K
lnK			K (ln real 2006 $ using CPI)
M_real			Real M
lnM			M (ln real 2006 $ using CPI)
go_real			Real Gross Output
lngo			go (ln real 2006 $ using CPI)
o400_ch			M_real, K_real, or go_real changed more than 400% from prev yr
o999p			M_real, K_real, or go_real is over 99.9th percentile
Q_ffe			Sample with non-missing male and female firm FE
Q_inprod		Sample in the productivity data
Q_gprod			Sample with good productivity data
Q_hc5			Indicator for firm has a head count of at least 5
Q_rest			Indicator for restricted sample
eth_eur			European ethnicity
eth_mao			Maori ethnicity
eth_pac			Pacific ethnicity
eth_asi			Asian ethnicity
eth_mel			Middle Eastern/Latin American/African ethnicity
eth_oth			Other ethnicity
eth_gp			Ethnicity combination
hp_ffe_m		Firm fixed effect for highest paying pent estimated using men only
hp_ffe_f		Firm fixed effect for highest paying pent estimated using women only
hp_ffe_t		Firm fixed effect for highest paying pent estimated using both genders
lnIDI_earnhp		Log real annual earnings from highest-paying pent (IDI)
age2			Age squared (/100)
hqual			Highest qualification
numkid_u18_cat		Number of dependent children parented
KL_rat			K/L ratio (/100,000)
val_ad_pw		Value added per worker (real $0,000s)
hc5pl			Highest paying pent has head count of 5 or more
nonmsgb			All controls used in 6 columns as non-missing
nonmsg			All controls used in 9 columns as non-missing
Q_rest_HLFS		In restricted sample and HLFS controls are non-missing
		
(4) GWgap_pr_firm_v4: Firm Level Dataset 

pent			ID Code for Enterprise (pent) of Highest Paying Job in year
year			Pent's financial year
months_empl		Number of months pent employed any non-WP
fte_t			Total FTE
L_hc_m			Male employee head count (avg)
L_hc_f			Female employee head count (avg)
L_hc_t			Total employee head count (avg)
L_fte_m			Total male FTEs
L_fte_f			Total female FTEs
L_fte_t			Total FTE
av_fte_m		Average male FTEs
av_fte_f		Average female FTEs
L_hc_m_ft		Full time male employee hc
L_hc_f_ft		Full time female employee hc
L_hc_m_pt		Part time male employee hc
L_hc_f_pt		Part time female employee hc
L_hc_lt25		Employee head count aged lt25
av_fte_lt25		Average FTEs of workers aged lt25
L_hc_25to39		Employee head count aged 25to39
av_fte_25to39		Average FTEs of workers aged 25to39
L_hc_40to54		Employee head count aged 40to54
av_fte_40to54		Average FTEs of workers aged 40to54
L_hc_55p		Employee head count aged 55p
av_fte_55p		Average FTEs of workers aged 55p
L_hc_lt25_ft		Employee head count ft aged lt25
L_hc_lt25_pt		Employee head count pt aged lt25
L_hc_25to39_ft		Employee head count ft aged 25to39
L_hc_25to39_pt		Employee head count pt aged 25to39
L_hc_40to54_ft		Employee head count ft aged 40to54
L_hc_40to54_pt		Employee head count pt aged 40to54
L_hc_55p_ft		Employee head count ft aged 55p
L_hc_55p_pt		Employee head count pt aged 55p
L_hc_ten0		Employee head count with tenure ten0
av_fte_ten0		Average FTEs of workers with tenure ten0
L_hc_ten1		Employee head count with tenure ten1
av_fte_ten1		Average FTEs of workers with tenure ten1
L_hc_ten2		Employee head count with tenure ten2
av_fte_ten2		Average FTEs of workers with tenure ten2
L_hc_ten0_ft		Employee head count ft with tenure ten0
L_hc_ten0_pt		Employee head count pt with tenure ten0
L_hc_ten1_ft		Employee head count ft with tenure ten1
L_hc_ten1_pt		Employee head count pt with tenure ten1
L_hc_ten2_ft		Employee head count ft with tenure ten2
L_hc_ten2_pt		Employee head count pt with tenure ten2
WP_t			Working proprietor head count
WP_cat			Working proprietor headcount (categories)
deflator		CPI deflator, multiply to convert to 2006 dollars
lnWB			Total wage bill (ln real)
multiplant		Firm had multiple plants in 2+ months
pf_ind			Industry
anz06_4d		Level 4 industry
M_nom			Materials
K_nom			Capital
go_nom			Gross Output
K_real			Real Capital
lnK			K (ln real 2006 $ using CPI)
M_real			Real Materials
lnM			M (ln real 2006 $ using CPI)
go_real			Real Gross Output
lngo			go (ln real 2006 $ using CPI)
o400_ch_K		K_real changed more than 400% from prev yr
o400_ch			M_real, K_real, or go_real changed more than 400% from prev yr
o99p_K			K_real is over 99th percentile for firms with hc>=5
o999p			M_real, K_real, or go_real is over 99.9th percentile
in_prody		In productivity data
ind2			2-digit ANZSIC06 industry
pf_ind_gp		Productivity industry group code
rand			Stable random number
Yyear_2003		year==2003
Yyear_2004		year==2004
Yyear_2005		year==2005
Yyear_2006		year==2006
Yyear_2007		year==2007
Yyear_2008		year==2008
Yyear_2009		year==2009
Yyear_2010		year==2010
Yyear_2011		year==2011
Yyear_2012		year==2012
Yyear_2013		year==2013
Yyear_2014		year==2014
Yyear_2015		year==2015
Yyear_2016		year==2016
INpf_ind_gp_2		pf_ind_gp==2
WPc_WP_cat_1		WP_cat==1
WPc_WP_cat_2		WP_cat==2
WPc_WP_cat_3		WP_cat==3
WPc_WP_cat_4		WP_cat==4
L_hc_m_lt25		Male employee head count aged lt25
av_fte_m_lt25		Average FTEs of males aged lt25
L_hc_f_lt25		Female employee head count aged lt25
av_fte_f_lt25		Average FTEs of females aged lt25
L_hc_m_25to39		Male employee head count aged 25to39
av_fte_m_25to39		Average FTEs of males aged 25to39
L_hc_f_25to39		Female employee head count aged 25to39
av_fte_f_25to39		Average FTEs of females aged 25to39
L_hc_m_40to54		Male employee head count aged 40to54
av_fte_m_40to54		Average FTEs of males aged 40to54
L_hc_f_40to54		Female employee head count aged 40to54
av_fte_f_40to54		Average FTEs of females aged 40to54
L_hc_m_55p		Male employee head count aged 55p
av_fte_m_55p		Average FTEs of males aged 55p
L_hc_f_55p		Female employee head count aged 55p
av_fte_f_55p		Average FTEs of females aged 55p
L_hc_m_lt25_ft		Male employee head count ft aged lt25
L_hc_f_lt25_ft		Female employee head count ft aged lt25
L_hc_m_lt25_pt		Male employee head count pt aged lt25
L_hc_f_lt25_pt		Female employee head count pt aged lt25
L_hc_m_25to39_ft	Male employee head count ft aged 25to39
L_hc_f_25to39_ft	Female employee head count ft aged 25to39
L_hc_m_25to39_pt	Male employee head count pt aged 25to39
L_hc_f_25to39_pt	Female employee head count pt aged 25to39
L_hc_m_40to54_ft	Male employee head count ft aged 40to54
L_hc_f_40to54_ft	Female employee head count ft aged 40to54
L_hc_m_40to54_pt	Male employee head count pt aged 40to54
L_hc_f_40to54_pt	Female employee head count pt aged 40to54
L_hc_m_55p_ft		Male employee head count ft aged 55p
L_hc_f_55p_ft		Female employee head count ft aged 55p
L_hc_m_55p_pt		Male employee head count pt aged 55p
L_hc_f_55p_pt		Female employee head count pt aged 55p
ffe_m			CCK firm FE calculated using males only
ffe_f			CCK firm FE calculated using females only
ffe_t			CCK firm FE calculated using both genders
lnKsq			lnK squared
lnK_lnM			lnK * lnM
lnMsq			lnM squared
Q_ffe			Sample with non-missing male and female firm FE
Q_inprod		Sample in the productivity data
Q_gprod			Sample with good productivity data
Q_hc5			Indicator for firm has a head count of at least 5
Q_rest			Indicator for restricted sample
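
Going by their labels, the second-order terms lnKsq, lnK_lnM, and lnMsq in this dataset can be rebuilt from lnK and lnM. The sketch below recomputes them and reports how far they are from the stored values. It assumes that GWgap_pr_firm_v4.dta is in the working directory and that the stored variables were built exactly as their labels suggest; the _chk and d_ variable names are illustrative and are not part of the dataset.

```stata
* Sketch only: recompute the second-order terms and compare with the stored values
use GWgap_pr_firm_v4, clear

* _chk variables are illustrative names, not part of the dataset
generate double lnKsq_chk   = lnK^2
generate double lnK_lnM_chk = lnK*lnM
generate double lnMsq_chk   = lnM^2

* Largest relative difference between each stored variable and its recomputed version
foreach v in lnKsq lnK_lnM lnMsq {
    generate double d_`v' = reldif(`v', `v'_chk)
    summarize d_`v', meanonly
    display as text "`v': max relative difference = " r(max)
}
```

Differences well above rounding error would suggest that the stored terms use a different construction from the one assumed here.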

*******************************************************************************;
* Stata files to Produce Paper Tables and Figures
*******************************************************************************;

* Population counts: GWgapr10_popcts.do

* Table 1: GWgapr10_indivIDIb_ed.do

* Table 2: GWgapr10_indivHLFSb_ed.do

* Table 3 and Appendix Table 5: 
a) GWgapr10_CCKnorm_ed.do
b) GWgapr10_CCKdecomp_ed.do 
c) GWgapr10_CCKIV_ed.do

These produce three output files, which must be entered by hand into Figure1_2.xls:
a) GWgapr10_CCKnorm_SUR.xlsx
b) GWgapr10_CCKnorm_fig2.xlsx
c) GWgapr10_CCKnorm_for_fig2.dta

* Figures 1 and 2: Figure1_2.xls

* Table 4: GWgapr10_pfpool_anz2d_TLsur_ed.do

* Table 5:

a) GWgapr10_pfindyr_TL_ed.do
b) GWgapr10_indyreg_pca_allobs_ed.do
c) GWgapr10_industry_year_regressions_ed.do

* Appendix Table 1: GWgapr10_desc_ed.do

* Appendix Table 2: GWgapr10_descIDI_ed.do

* Appendix Table 3:

a) GWgapr10_pfpool_anz2d_TLsur_noindFE_ed.do
b) GWgapr10_pfpool_anz2d_TLsur_loWP_ed.do
c) GWgapr10_pfpool_anz2d_TLsur_VA_ed.do
d) GWgapr10_pfpool_anz2d_TLsur_Gallen_ed.do
e) GWgapr10_pfpool_anz2d_TLsur_Gallen_nonneg_ed.do
f) GWgapr10_pfpool_anz2d_TLsur_ftindy_ed.do

* Appendix Table 4: GWgapr10_indivHLFS_het_ed.do

* Appendix Table 6: 

a) GWgapr10_pfpool_TLsur_LgXage_ed.do
b) GWgapr10_pfpool_TLsur_LgXten_ed.do

* Appendix Figure 1: GWgapr10_desc_byyr_ed.do 

This outputs GWgapr10_desc_byyr.xlsx, which then needs to be copied by hand to
Appendix Figure 1.xlsx.

* Appendix Figures 2 and 3:

a) GWgapr10_pfbyind_TL_ed.do
b) GWgapr10_pfbyind_TL_plot_ed.do

